Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking

Similar resources

Learning the Time-delay Manifold for Robust Speaker Localization

We present an algorithm for high dimensional density estimation which is efficient (both computationally and statistically) when the distribution is concentrated close to a low dimensional smooth manifold. The algorithm uses several random projections to generate a hierarchical mixture of Gaussians which rapidly converges to the underlying manifold. We use this algorithm to perform robust estim...

Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks

Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberations are present in the mixtures. In this paper, we propose a new stereo speech separation system where deep neural networks are used to generate soft T-F mask for separation. More specific...

Robust speech separation using time-frequency masking

A multi-microphone time-frequency speech masking technique is proposed. This technique utilizes both the time-frequency magnitude and phase information in order to estimate the Signal-to-Noise Ratio (SNR) maximizing masking coefficients for each time-frequency block given that the direction (or alternatively, the time-delay of arrival) of the speaker of interest is known. Using this masking algo...

Robust digit recognition using phase-dependent time-frequency masking

A technique using the time-frequency phase information of two microphones is proposed to estimate an ideal time-frequency mask using time-delay-of-arrival (TDOA) of the signal of interest. At a signal-to-noise ratio (SNR) of 0 dB, the proposed technique using two microphones achieves a digit recognition rate (average over 5 speakers, each speaking 20-30 digits) of 71%. In contrast, delay-and-sum b...

Time-frequency masking for large scale robust speech recognition

Time-frequency mask estimation has shown considerable success recently. In this paper, we demonstrate its utility as a feature enhancement frontend for large vocabulary conversational speech recognition. Additionally, we investigate how masking compares with feature denoising, which directly reconstructs clean features from noisy ones. We train a mask estimator that predicts ideal ratio masks. ...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2019

ISSN: 2329-9290, 2329-9304

DOI: 10.1109/taslp.2018.2876169